FOREWORD


This technical note suggests methods for incorporating automatic
and semi-automatic target classification techniques into the design of
advanced active sonar systems. The discussion results in a proposed
design for a classification subsystem that is computationally fast and
adaptive, and that provides operationally meaningful information to sonar,
fire control, and command and control personnel.

The work represented herein was done from June 1967 through December
1968 under direction of NUC personnel now associated with NUC Code 603,
Simulation, Analysis and Applications Division. It was sponsored by
NAVSHIPSYSCOM (Code 00V2) through the Conformal/Planar Array Program
Project Office and the New Submarine Sonar/Fire Control System Project
Office. Technical assistance was obtained from Computer Applications,
Incorporated (CAI), under Contract N123(953)57400A.

Only a portion of the total effort expended is reported in this
technical note. An alternative nonstatistical approach was also pursued.
The results of this latter work are documented in NUC Technical Note 543.^

The guidance and technical contributions of R. P. Schindler, now of
the Naval Electronics Laboratory Center, are gratefully acknowledged, as
well as the programming support provided by Mrs. J. Sentovic and R. T.
Napier, also of that Center, and by M. Einhorn of Computer Sciences
Corporation.
Superscript numbers denote references at end of report, preceding
the appendices.
AN APPROACH TO

TARGET CLASSIFICATION BY COMPUTER
IN ADVANCED ACTIVE SONAR SYSTEMS


I.  INTRODUCTION 

Traditionally, and to a large extent today, the responsibility for making
a classification decision on a detected target rests with the sonar operator.
It is his task to review all target-related information received through the
various inputs available to him. From these sources he must extract only the
information he considers pertinent and then correlate this information before
arriving at a classification decision.

Automated techniques are greatly needed to assist the sonar operator in
selecting, extracting, and correlating pertinent data. This need will
become more acute in the complex advanced sonar systems which are currently
under development.

The present investigation included four kinds of overlapping activities:
(1) a search of current literature for applicable techniques, (2) development
of new analytical techniques to augment existing ones, (3) generation of
computer programs to implement and evaluate these techniques, and
(4) development of a design for a complete semi-automatic classification
subsystem.

The general classification problem is formidable, and despite the fact that
the approach in this investigation was restricted to methods realizable in a
real-time system, the results are necessarily partial and the recommendations
tentative. Nevertheless, essential ground has been covered, and it is believed
that the observations made here will remain valid until the next major
theoretical advances in classification techniques are achieved. Also, many of
the topics discussed in this note may be of interest to those concerned with
applications of general computerized learning and classification methods to
areas beyond the domain of advanced digital sonar systems.


II.  CLASSIFICATION  IN  ADVANCED  ACTIVE  SONAR  SYSTEMS 


A. The Classification Subsystem. Advanced submarine and surface ship
sonar systems now under Navy development are being designed to incorporate
computerized data processing complexes (DPC). The DPC will digitally correlate
the inputs from the active and passive sonar sensors and combine them with
external inputs such as environmental data, own-ship's status, and intelligence
information. High-speed digital processing of this information will allow the
system to carry out, with varying degrees of operator assistance, the functions
of target detection, tracking, classification, and threat evaluation, and will
automatically provide fire control solutions and weapon settings. In addition,
the DPC will generate graphic and digital display formats for sonar, fire
control, and command and control personnel.

This technical note deals with the portion of the DPC that performs the
target classification function utilizing the active sonar returns. It is
considered a subsystem of the DPC, and is subject to normal system constraints.
For instance, the classification subsystem must be real-time in the sense that
it must process its external inputs and output its results before data from
the next ping arrive. Since a digital computer is the heart of the DPC, the
classification computer programs must be designed to respond first to inputs of
high priority to the exclusion or deferment of lower priority functions.
Time-consuming operations and those not requiring fast response cannot be
permitted to interfere, and are accordingly relegated to "background"
processing. In addition to the design considerations peculiar to a real-time
system, this subsystem shares the constraint of any complex computational
system of programs: a subprogram may not consume operating time or core memory
space which is out of proportion to its importance.
B. Probabilities of Class Membership. The computational power
available in the DPC permits the application of statistical decision theory
to the problem of either automatic or semi-automatic (operator-assisted)
target classification. Desirable simplifications can be made in the statistical
theory if we ignore a priori probabilities and cost functions. This can be
done because there is no risk involved in ranking targets by probability
scalars, as opposed to deciding that they belong to class
$\theta_1, \theta_2, \ldots, \theta_n$.

The classification subsystem outputs are in the form of likelihood ratios
or probabilities and associated confidence levels. The likelihood ratios
can be expressed as

$$\frac{P(X \mid \theta_k)}{1 - P(X \mid \theta_k)} \tag{1}$$

where $P(X \mid \theta_k)$ is the probability of event $X$ occurring given that
it is a member of class $\theta_k$. To convey information to an operator in a
more meaningful fashion, these probabilities can be expressed directly:


$$P(\theta_k \mid X) = \frac{P(X \mid \theta_k)\,P(\theta_k)}{\sum_{j=1}^{n} P(X \mid \theta_j)\,P(\theta_j)} \tag{2}$$

where $X$ is the unknown observation and $P(X \mid \theta_k)$ is the conditional
probability density for the $k$th class. Restated in these terms, the
classification problem becomes one of computing $P(\theta_k \mid X)$ from the
estimated conditional probability densities $P(X \mid \theta_k)$. To obtain
acceptable estimates of these probability densities, we should observe as many
samples from each target class as possible. This is extremely difficult,
especially in the case of obtaining samples of actual hostile vessels and
weapons.
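
Equation (2) can be sketched in a few lines of Python. This is an illustration only: the function name `class_posteriors`, the class labels, and the density values are assumptions made for the example, and equal a priori probabilities are used when none are supplied, in keeping with the simplification described above.

```python
def class_posteriors(densities, priors=None):
    """Return P(theta_k | X) for each class from the densities P(X | theta_k).

    densities: dict mapping class label -> estimated density at the observed X.
    priors:    optional dict of a priori probabilities; equal priors if omitted.
    """
    if priors is None:
        priors = {label: 1.0 / len(densities) for label in densities}
    weighted = {label: densities[label] * priors[label] for label in densities}
    total = sum(weighted.values())
    if total <= 0.0:
        raise ValueError("all class densities are zero at this observation")
    return {label: w / total for label, w in weighted.items()}


# Hypothetical density values for one observation X (illustrative numbers only).
example = class_posteriors({"submarine": 0.02, "whale": 0.06, "noise": 0.12})
```

Because the result is normalized over all classes, the values can be used directly as the ranking scalars discussed above, whatever the absolute scale of the density estimates.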
It is clear, however, that the computer must have some estimate of each
of these probability densities to make a classification. These estimates
can and should be improved as more observations are made. The initial densities
used will be based on simulated data, the generation of which is a difficult
problem in itself. Ideally, the influence of the simulated data will gradually
diminish as real samples become available, but there is little assurance that
real samples would occur frequently enough or over a sufficiently representative
range to obviate the simulated data completely.


C. Learning. The procedure which forms a representative conditional
probability density function $P(X \mid \theta_k)$ for a class of objects from
past observations of samples is referred to as "learning". The most thoroughly
analyzed and tractable application of statistical learning is for multivariate
normal distributions (MND). Learning, for the MND, is the computation of the
sample covariance matrix, $U$, and the sample mean, $\bar{X}$, as estimates of
the parameters of the sample's parent distribution $(\Sigma, \mu)$. This
exemplifies "parametric learning", where the form of the underlying
distribution is assumed and its characteristic parameters are estimated with
diminishing error as the number of samples increases.

Conversely, non-parametric or distribution-free learning assumes nothing
about the form of the distribution, but responds directly to the samples as
does a histogram. The single outstanding non-parametric multivariate method
existent is the Polynomial Discriminant Method (PDM).² The relative merits
of the parametric and non-parametric methods will be discussed in Section III.

The major difficulty in applying the learning procedures to sonar is
that of obtaining usable quantities of representative data over the range of
circumstances in which the target types of interest occur. For submarine
targets, for example, these data include pertinent combinations of range,
speed, depth, and aspect angle for surface, bottom bounce, and convergence
zone modes of transmission.
D. Adaptation. Adaptation is interpreted as the process of modifying
the originally learned probability density functions in response to the
reception of additional samples of known target classes. These samples have
previously been classified by an external source. If the new samples
are weighted the same as samples originally learned, the final result
will be the same as if all the samples had arrived at the same time. Such
an elementary scheme is appropriate only to data that are time-invariant,
and is a special case of adaptation. The importance of adaptation is apparent
from the following comments on the characteristics of certain target types:

1. Submarines will provide comparatively few samples. It would
therefore be necessary to remain sensitive to new samples while retaining
all of the older learned information. Also, the real time between observations
of submarine targets ordinarily would be great, suggesting that time-weighting
should be discounted for submarine samples.

2. Noise, in contrast to submarines, would provide a far more
continual supply of representative target samples which might vary gradually
with time and location. Here, a straightforward time-weighting scheme
would cause the influence of older samples to fade and be supplanted by the
newer information. The rate at which the weight would diminish with age would
be controlled by a parameter chosen empirically. The relative stability of
most classes of targets would permit adaptation computations to take place
on an infrequent background basis.

3. Other target sources such as sea mounts, schools of fish, whales,
etc., fall between submarines and noise in the rate at which samples will be
available, and how to merge these sources for learning is a separate problem.

The point here is that the method of adaptation has to be consistent
with our model of the processes we are studying; i.e., are they stationary,
highly unpredictable, etc.? From the point of view of simplicity, we would
like the adaptation algorithm to be in parametric form and not involve
elaborate computations.
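
The two weighting schemes described above (equal weighting for scarce submarine samples, fading weights for plentiful noise samples) can be sketched as a running update of a class mean vector. The class name `AdaptiveMean` and the parameter `fade` are hypothetical; `fade` stands in for the empirically chosen parameter mentioned in item 2, and the same idea would extend to the covariance terms.

```python
class AdaptiveMean:
    """Running estimate of a class mean vector under two weighting schemes."""

    def __init__(self, dim, fade=None):
        # fade=None: every sample weighted equally (time-invariant data).
        # 0 < fade < 1: the newest sample gets weight `fade`, so the
        # influence of older samples decays geometrically with age.
        self.mean = [0.0] * dim
        self.count = 0
        self.fade = fade

    def update(self, sample):
        self.count += 1
        if self.count == 1:
            weight = 1.0                 # first sample initializes the estimate
        elif self.fade is None:
            weight = 1.0 / self.count    # equal weighting of all samples so far
        else:
            weight = self.fade           # time-weighted (fading) update
        self.mean = [m + weight * (x - m) for m, x in zip(self.mean, sample)]
        return self.mean


submarine = AdaptiveMean(2)            # equal weights: retains all old information
noise = AdaptiveMean(2, fade=0.5)      # fading weights: tracks gradual drift
for sample in ([0.0, 0.0], [2.0, 2.0], [4.0, 4.0]):
    submarine.update(sample)
    noise.update(sample)
```

With equal weighting the result is the same as if all samples had arrived at once, which is the special case noted in the text; with fading weights the estimate leans toward the most recent samples.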


E. Classification. The act of making a target classification decision
is too crucial and the cost of a false alarm too high to permit this function
to be completely automatic. The operator with the responsibility of making a
target classification decision will have to act on his own judgment, based on
the automatic estimates of the target class probabilities and tempered by
all available active and passive tracking data.

For each ping's worth of new data, the classification subsystem will
automatically assign class probabilities, $P(\theta_k \mid X)$, for newly
detected targets and for those which are currently being actively or passively
tracked. Ideally, the classification subsystem would unerringly label each
target with the correct $\theta_k$. Realistically, the best that can be done
is to estimate the likelihood that the unknown target is from $\theta_k$ and
also estimate a confidence level for the likelihood ratio. These variables may
be used conveniently to rank the relative importance of each target as
suggested by Table 1.


TABLE 1. TARGET RELATIVE IMPORTANCE AS A FUNCTION OF
HOSTILE TARGET LIKELIHOOD RATIO AND CONFIDENCE
IN LIKELIHOOD RATIO


Hostile Target        Confidence in         Target Relative
Likelihood Ratio      Likelihood Ratio      Importance

Low                   High                  Least important
Low                   Low                   Important
High                  Low                   Important
High                  High                  Most important


Appendix  A discusses  a procedure  which  could  be  used  to  evaluate  relative 
target  importance  as  suggested  by  Table  1. 
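
Table 1 can be read as a simple lookup followed by a sort. The sketch below is not the Appendix A procedure; it only illustrates the table. The cut-off values that separate "low" from "high" (`ratio_cut`, `confidence_cut`) and the target names are assumptions made for the example.

```python
# Table 1 as a lookup: (likelihood-ratio level, confidence level) -> importance.
TABLE_1 = {
    ("low", "high"): "least important",
    ("low", "low"): "important",
    ("high", "low"): "important",
    ("high", "high"): "most important",
}
# Smaller number = presented to the operator first.
PRIORITY = {"most important": 0, "important": 1, "least important": 2}


def relative_importance(ratio, confidence, ratio_cut=1.0, confidence_cut=0.5):
    """Classify one target; the two cut-off values are illustrative assumptions."""
    ratio_level = "high" if ratio >= ratio_cut else "low"
    confidence_level = "high" if confidence >= confidence_cut else "low"
    return TABLE_1[(ratio_level, confidence_level)]


def rank_targets(targets):
    """Sort (name, ratio, confidence) tuples from most to least important."""
    return sorted(targets, key=lambda t: PRIORITY[relative_importance(t[1], t[2])])


ranked = rank_targets([("T1", 0.2, 0.9), ("T2", 5.0, 0.9), ("T3", 5.0, 0.1)])
```

Note that a confidently non-hostile target ranks below an uncertain one, which matches the first two rows of the table.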
Classification, in this context, involves the evaluation of all the
$P(X \mid \theta_k)$'s for some selected number of targets each ping, and then,
from Equation 2, computing the desired scalar class membership probabilities,
$P(\theta_k \mid X)$, for these targets. The classification subsystem,
therefore, serves primarily to give a ranking, in terms of a scalar quantity,
$P(\theta_k \mid X)$, which may be used to assist the operator in making his
classification decision or may be thresholded to provide an automatic alarm.
The latter approaches a totally automatic classification subsystem.

The  classification  function  exemplifies  a foreground,  priority  function 
of  a real-time  data  processing  system  due  to  the  frequency  of  its  operation 
and  to  the  fact  that  the  sonar  platform  may  well  be  in  a race  with  a hostile 
computer  to  establish  a fire  control  solution. 

III. TECHNICAL DISCUSSION

A. Vector Representation, the Measurement Space. For each ping's worth
of available information, target observations are made which consist of
extracting d ordered measurements. These measurements can be represented by
d ordered real numbers $(x_1, x_2, \ldots, x_i, \ldots, x_d)$ corresponding to
d-dimensional vectors, X, or points in Euclidean d-space, which we will refer
to as the "measurement" space. This representation is necessary for a
mathematical treatment and is merely an extension of the familiar Cartesian
coordinates.

Suppose that m active sonar echoes from a target are observed and three
measurements are taken from each echo: range (R), amplitude (A), and doppler
(D). Table 2 illustrates both a descriptive notation and the generalized
notation which will be used throughout the remainder of this note.


TABLE 2. VECTOR NOTATION


                 Descriptive Notation         Generalized Notation
                      Dimension                    Dimension
Observation      R        A        D          1        2        3 (= d)

1st              R_1      A_1      D_1        x_11     x_12     x_13
2nd              R_2      A_2      D_2        x_21     x_22     x_23
 .                .        .        .           .        .        .
jth              R_j      A_j      D_j        x_j1     x_j2     x_j3
 .                .        .        .           .        .        .
mth              R_m      A_m      D_m        x_m1     x_m2     x_md

R = range, A = amplitude, D = doppler

B. Discriminant Functions. The possibility of using multiple linear
surfaces (discriminant functions) to compartmentalize the measurement space
was considered but found to be an unsuitable approach for the following
reasons:

1. Specifically separating surfaces are sensitive only to the
available known samples and are susceptible to error when used to classify
unknowns.

2. The problem does not call for classification decisions, but
for relative estimates that an unknown target is a submarine, torpedo, etc.

3. The direct use of separating surfaces does not provide a plausible
mechanism for estimating the desired likelihood ratios and confidence levels.
Of course, measures of the likelihood ratios and confidence levels could
always be defined in terms of distances from surfaces, but only in an
arbitrary, artificial fashion.

4. Finally, surfaces determined from a statistical foundation can be
made to classify at least as well as those from class-separating algorithms
by comparing the likelihood ratio with a constant (C). For example (and
ignoring considerations of costs and a priori probabilities), decide:

$$X \in \theta_1 \quad \text{if} \quad \frac{P(X \mid \theta_1)}{P(X \mid \theta_2)} > C \tag{3}$$

Or from Equation 2 decide:

$$X \in \theta_1 \quad \text{if} \quad P(\theta_1 \mid X) > \frac{C}{C + 1} \tag{4}$$

Since the functions $P(X \mid \theta_1)$ and $P(X \mid \theta_2)$ are
unrestricted, the resulting decision regions can become very complex. It
should be pointed out that overly complex surface structures are not only hard
to use, but may well give results inferior to those obtained with simpler
surfaces.
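
The step from the likelihood-ratio test to the posterior-probability test can be written out as follows. This is a short derivation under the assumptions already stated in item 4 (two classes, costs and a priori probabilities ignored, and C not negative); the symbol $\ell$ for the likelihood ratio is introduced here for convenience.

```latex
% Two classes with equal a priori probabilities, so the priors cancel in Eq. (2):
\[
P(\theta_1 \mid X)
  = \frac{P(X \mid \theta_1)}{P(X \mid \theta_1) + P(X \mid \theta_2)}
  = \frac{\ell}{\ell + 1},
\qquad
\ell = \frac{P(X \mid \theta_1)}{P(X \mid \theta_2)} .
\]
% The map t -> t/(t+1) is strictly increasing for t >= 0, hence
\[
\ell > C
\quad\Longleftrightarrow\quad
P(\theta_1 \mid X) = \frac{\ell}{\ell + 1} > \frac{C}{C + 1} .
\]
```

For instance, C = 1 corresponds to a posterior threshold of 1/2, and C = 3 to a threshold of 3/4.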
C. Clusters and the Shape of Sample Distributions. In the statistical
approach selected for this analysis, known vectors can be considered
as samples from unknown parent distributions. These samples may be visualized
as comprising clusters of points in the familiar three-dimensional
space of our experience. Thus, sample populations from a given source
class may be described in general qualitative terms such as sparse, dense,
locally dense, ellipsoidally symmetric, homogeneous, etc. More complicated
cluster configurations are more difficult to describe verbally or in terms
of mathematical parameters. The problem of efficient description of an
entire sample set can be greatly alleviated if the data are divided into
easily describable subsets. Such attempts are discussed later in this section.
D. Sample Representation. If a large number of sample observations
have been taken on a target, it would not be practical to store all the samples
explicitly for later reference. One method of alleviating this problem would
be to decimate the data, but this could lead to an undesirable loss of
information in cases where the residual samples are not representative of the
discarded samples. Also, the data that can be discarded without undue
compromise depends upon the quality and amount of the data. Therefore, the
motivation to reduce the requirement for reference data must be subordinate
to the problem of faithfully representing the "shape" of any clusters contained
in the data. In this note all numerical attempts at data representation are
referred to as "cluster analysis".

The literature on cluster analysis is extensive, but very little is
applicable to cluster analysis in sonar classification. The best
computer-oriented cluster finding technique uncovered in the literature seems
to be ISODATA.³ Successful experiments have been carried out at this Center
with the ISODATA technique. However, existing cluster analysis methods
appeared to have too many inherent risks and deficiencies. For example,
methods which introduce the data sequentially are sensitive to the order in
which the data are introduced. Also, the picture can change precipitously when
certain parameters are marginal; e.g., the parameter that governs whether two
or more clusters should be combined into one cluster. The validity of some
other methods depends upon the shape of clusters. For instance, the histogram
method of G. Sebestyen⁴ favors ellipsoidal shapes.

These shortcomings were deemed unacceptable because of our overall
objective of recommending techniques which would perform reasonably well
over a broad range of cluster configurations. Accordingly, a method which
would regard all data simultaneously and fix the number of clusters as
unequivocally as possible was sought. A vector field approach of considerable
promise was developed and its feasibility was verified graphically by programs
using the CALCOMP plotter and a computer-driven display.^

The evaluation of the efficacy of a cluster-seeking method should not
be merely subjective. A step toward more objective comparisons has already
been made in the generation of a few "standard" sets of data. We
feel that these clearly useful attempts are nevertheless incapable of meeting
an essential requirement: a truly general cluster analysis technique must
demonstrate itself over the applicable range of sample sizes, dimensions, and,
most important of all, data configurations. We approached this problem by
using Monte Carlo techniques where the data sets were generated randomly.
Because this implied a lack of complete control over the test data, an
automatic and objective way of measuring cluster finding performance was
called for. Appendix B contains suggestions for objectively measuring this
performance.

It should be emphasized that a rigorous test of a cluster analysis
technique is necessary before it can be considered ready for operational use.
The test data should be designed to produce significant problems in contrast
to the well-separated clusters often used to illustrate techniques. As a step
toward the generation of test data, a program was written which makes use of
randomly generated covariance matrices. This program implements suggestions
of T. P. Norris (NUC) and G. A. Butler (CAI) and carries out the following
steps:

1. Randomly selects d real vectors of d components each. These
form a basis if they are chosen to be linearly independent.

2. Applies the Gram-Schmidt orthogonalization process to those
d vectors. The resulting orthonormal set is then arrayed in the d x d matrix, W.

3. Randomly chooses a diagonal matrix, S, with diagonal elements
$\lambda_1, \lambda_2, \ldots, \lambda_d$, all greater than zero.

4. Computes the desired covariance matrix by the similarity
transformation

$$U = W^{T} S W \tag{5}$$

where U is a positive definite matrix with a determinant equal to the product
$\lambda_1 \lambda_2 \cdots \lambda_d$. Because W is orthogonal, $W^{T} = W^{-1}$.

If U corresponds to a d-variate normal distribution, we could say the
$\lambda$'s determine its shape and spread, and W determines its orientation.
The idea is that test clusters of samples generated by the distribution would
tend to have the same shape and orientation. Another program, TDATA,* generates
a specified number of samples from U with an assumed mean of zero. We have
the ability to produce random cluster systems by generating the union of a
number of more or less ellipsoidal clusters whose means would be chosen to
ensure overlap. The overlap in turn gives rise to larger clusters of more
complex shapes which cluster analysis techniques would have to resolve into
their simpler components.
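
The four steps for generating a random covariance matrix can be sketched as follows. This is not the original program: the function names, the use of Gaussian random components, and the range from which the diagonal elements are drawn (`low`, `high`) are all assumptions made for the illustration. The orthonormal vectors are stored as the rows of W.

```python
import random


def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (the rows of W)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm < 1e-12:
            raise ValueError("vectors are not linearly independent")
        basis.append([wi / norm for wi in w])
    return basis


def random_covariance(d, rng, low=0.5, high=4.0):
    """Return (U, lambdas) with U = W^T S W and S = diag(lambdas)."""
    while True:
        try:
            # Steps 1 and 2: d random vectors, then orthonormalization.
            W = gram_schmidt([[rng.gauss(0.0, 1.0) for _ in range(d)]
                              for _ in range(d)])
            break
        except ValueError:
            continue  # redraw in the unlikely event of linear dependence
    # Step 3: positive diagonal elements of S.
    lambdas = [rng.uniform(low, high) for _ in range(d)]
    # Step 4: U[i][j] = sum over k of W[k][i] * lambda_k * W[k][j].
    U = [[sum(W[k][i] * lambdas[k] * W[k][j] for k in range(d))
          for j in range(d)] for i in range(d)]
    return U, lambdas


U, lambdas = random_covariance(3, random.Random(1))
```

The resulting U is symmetric, and its trace equals the sum of the chosen diagonal elements, which gives a quick check that the transformation preserved the spectrum.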


*TDATA was borrowed from L. Traister of CAI, and modified by R. Napier
and J. Roese of NUC.
E. Measurement Selection. We have not attempted to resolve the difficult
question of how many measurements should be taken on a detected target.
It is true that the amount of classification information does not decrease
when new measurements are added, even if they are not found to be helpful.
However, if it cannot be shown that each new measurement is independent of all
the others, it is a costly effort to find out just how much information it is
contributing.^ Aside from the theoretical argument against too many
measurements, the fact is inescapable that an increase in dimensionality will
mean non-linear increases in computer storage and processing time. At this
point we can only suggest that measurements be chosen which we know to be most
relevant to the physics of the situation. This suggests that each mode of
active sonar transmission (surface duct, bottom bounce, convergence zone)
should have its own set of measurements.
F. Statistical Learning. Learning, in the context of this problem,
means the estimation of all of the conditional probability density functions
$P(X \mid \theta_k)$ of the target classes. It will have to be assumed that the
sample data from which the system learns are representative; that is, that the
relative number of samples in a unit volume of the measurement space is
suggestive of the probability density there. As was indicated earlier, it is
extremely unlikely that there will ever be enough real samples to meet this
requirement; therefore, the use of simulated data from elaborate models is
almost inevitable. The available samples are broken up into clusters of more
or less convex shape, and it is assumed that each cluster consists of samples
from a local d-variate normal distribution. Learning is then reduced to
computing the sample mean vector, $\bar{X}^k$, and sample covariance matrix,
$U^k$, for each cluster. By definition these computations are as follows:

[Equations (6) through (9) are illegible in the source.]


For the purpose of efficient computation, the means of the measurements
would be computed first; then the elements of the sample covariance matrix
can be computed quickly using Equation (10):

[Equation (10), giving the element $u_{rs}$ of the sample covariance matrix,
is illegible in the source.]


It should be noted that the simple computations of Equations (6) and (10)
are all that is needed to estimate the parameters of the d-variate normal
distribution, which has the form of Equation (11):

$$\hat{P}(X \mid \theta_k) = (2\pi)^{-d/2}\,|U^k|^{-1/2} \exp\!\left[-\tfrac{1}{2}\,(X - \bar{X}^k)\,(U^k)^{-1}\,(X - \bar{X}^k)^{T}\right] \tag{11}$$


The  circumflex  reminds  us  that  Equation  (11)  is  an  approximation  to  some 
underlying  probability  density  whose  form  we  have  not  literally  assumed, 
but  which  we  hope  is  not  too  unlike  the  normal  case. 
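
The learning step for one cluster can be sketched as below. The code uses the standard definitions of the sample mean and sample covariance, with the means computed first and each covariance element then obtained from a sum of cross-products. The divisor M - 1 and the function name `learn_cluster` are assumptions made for this illustration.

```python
def learn_cluster(samples):
    """Return (mean, U) for a list of d-dimensional samples (M >= 2).

    The means are computed first; each covariance element is then formed
    from the sum of cross-products, exploiting the symmetry of U.
    The M - 1 divisor is an assumption (the unbiased convention).
    """
    m = len(samples)
    d = len(samples[0])
    mean = [sum(x[r] for x in samples) / m for r in range(d)]
    U = [[0.0] * d for _ in range(d)]
    for r in range(d):
        for s in range(r, d):            # upper triangle only; U is symmetric
            cross = sum(x[r] * x[s] for x in samples)
            U[r][s] = U[s][r] = (cross - m * mean[r] * mean[s]) / (m - 1)
    return mean, U


# Three hypothetical two-dimensional samples (illustrative numbers only).
mean, U = learn_cluster([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
```

Only d means and d(d + 1)/2 distinct covariance elements need to be stored per cluster, regardless of how many samples were observed. The cross-product form can lose precision when the means are large relative to the spread, so a centered computation may be preferable in that case.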

The  symmetrical  nature  of  the  quadratic  form  in  the  exponent  of 
Equation  (11)  permits  a computational  shortcut  with  significant  savings 
as  shown  below. 

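Let $Q^k$ denote the quadratic form in the exponent of Equation (11),
$Q^k = (X - \bar{X}^k)\,(U^k)^{-1}\,(X - \bar{X}^k)^{T}$.

[Equations (12) and (13), which give the shortcut, are illegible in the source.]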
The derivation of the multivariate normal distribution and the optimality
of the sample mean and sample covariance matrix estimators are thoroughly
developed in the literature.⁷,⁸

The computation of the likelihood ratio also has desirable simplicity.
If the likelihood ratio, $\ell$, is concerned with class h versus class k, then

$$\ell = \frac{\hat{P}(X \mid \theta_h)}{\hat{P}(X \mid \theta_k)} = \frac{|U^h|^{-1/2} \exp\left(-\tfrac{1}{2} Q^h\right)}{|U^k|^{-1/2} \exp\left(-\tfrac{1}{2} Q^k\right)} \tag{14}$$

It is more convenient to deal instead with the natural logarithm of $\ell$,
which is, of course, monotonic with respect to $\ell$.

$$\log_e \ell = \tfrac{1}{2}\left(\log_e |U^k| - Q^h - \log_e |U^h| + Q^k\right) \tag{15}$$

$$\log_e \ell = \tfrac{1}{2}\left(\log_e |U^k| - \log_e |U^h| + Q^k - Q^h\right) \tag{16}$$

The last equation shows that the evaluation of quadratic forms is all that
is required to determine $\ell$, since the logarithms of the determinants would
be stored as slow-changing parameters. The above equations have been
programmed at this Center and successfully evaluated for two-dimensional
sets of data.
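
For the two-dimensional case, the log likelihood ratio can be sketched directly, since a symmetric 2 x 2 covariance matrix has a closed-form inverse. This is not the program referred to above; the function names and the tuple layout used for a learned class are assumptions made for the example. The quadratic form exploits symmetry by computing the cross term once and doubling it.

```python
import math


def quad_form_2d(x, mean, U):
    """Q = (x - mean) U^{-1} (x - mean)^T for a symmetric 2 x 2 matrix U."""
    det = U[0][0] * U[1][1] - U[0][1] * U[0][1]
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # The two off-diagonal terms are equal, so one product is doubled.
    return (U[1][1] * dx * dx - 2.0 * U[0][1] * dx * dy + U[0][0] * dy * dy) / det


def log_det_2d(U):
    """Natural logarithm of the determinant; computed once when a class is learned."""
    return math.log(U[0][0] * U[1][1] - U[0][1] * U[0][1])


def log_likelihood_ratio(x, class_h, class_k):
    """log_e of P(x | h) / P(x | k); each class is (mean, U, stored log|U|)."""
    mean_h, U_h, logdet_h = class_h
    mean_k, U_k, logdet_k = class_k
    q_h = quad_form_2d(x, mean_h, U_h)
    q_k = quad_form_2d(x, mean_k, U_k)
    return 0.5 * (logdet_k - logdet_h + q_k - q_h)


# Two hypothetical classes with unit covariance and different means.
identity = [[1.0, 0.0], [0.0, 1.0]]
class_h = ([0.0, 0.0], identity, log_det_2d(identity))
class_k = ([2.0, 0.0], identity, log_det_2d(identity))
ratio_at_origin = log_likelihood_ratio([0.0, 0.0], class_h, class_k)
```

A positive value favors class h and a negative value favors class k; a point midway between the two means gives zero when the covariances are equal.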

A serious practical problem arises when the observed unknown has one
or more missing or very noisy measurements. Some effort was expended on an
analytical solution to this problem, and this appears in Appendix C. It
has been suggested that the best that can be done is to ignore the missing
dimension entirely; that is, deal with a subspace. For the sake of practicality,
we suggest that the unacceptable measure, $x_i$, be replaced by the following
weighted interpolation:

[Equation (17), a weighted combination of the class means $\bar{x}_i^k$ and
$\bar{x}_i^h$ with weights formed from the covariance elements $u_{ii}^k$ and
$u_{ii}^h$, is only partly legible in the source.]

This interpolation is an attempt to force $x_i$ into a position between the
means of the classes k and h so as not to bias the quadratic form toward one
class or the other. All of the parameters of Equation (17) are readily
available from the sample means and sample covariance matrices.


G. Confidence. It is important, for operational purposes, to have
some measure of the level of confidence in the likelihood estimate based
on an unknown X. This is a very difficult task which has never been done,
to our knowledge, even for the case where the multivariate normal assumption
is made. There is no way of knowing, or expressing, the condition where
the probability estimates are in error due to learning data which are not
completely representative; i.e., which do not exist in all regions of the
space in which unknowns might occur. For this reason the statistical
confidence measures described here are simply a function of the number of
available samples. Statistical confidence is expressed as the probability
that a sample from a random variable falls within a given range of the random
variable. It was not possible to give a complete analytical expression of
statistical confidence without making strong simplifying assumptions about
the forms of the probability density functions of the active sonar targets
of interest; even then, the analytical problems were severe.

We took two approaches to estimating confidence levels: (1) Monte
Carlo techniques, and (2) an intuitive technique with some analytical basis.

The Monte Carlo approach chosen was based upon the d-variate normal fit
method of probability density estimation using clusters of more or less
describable shapes (ellipsoidal and rectangular). This error analysis is
therefore valid only for cluster analysis methods which
control the shape of the resulting clusters. Where the cluster shapes
are not controlled or are only partially controlled, the idea of a confidence
measure for the d-variate normal fit becomes meaningless; i.e., our
confidence would always be very low. The Monte Carlo attempt was not completed
due to time restrictions; however, the following procedure for analyzing the
problem was defined:

1. Randomly choose a covariance matrix, $\Sigma$.

2. Randomly generate a number of samples, M, from $N(0, \Sigma)$ using
the TDATA Program.

3. Compute a sample covariance matrix, U, and sample mean $\bar{X}$
from the M samples.

4. Using Equation (11), compute the estimate $\hat{P}(X)$ for a variety of points {X}.

5. Using $\Sigma$ (the "true" covariance matrix) and a mean of zero
(the "true" mean), use Equation (11) to get the "true" P(X) values.

6. Compute the relative error $(\hat{P}(X) - P(X))/P(X)$ and other error
functions for each X.

7. Record: d, M, |U|, Q, and the error functions.

8. Repeat Steps 2-7 some number of times and then go back to
Step 1.
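
The eight steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the unfinished FORTRAN program: it is restricted to d = 2, it assumes Equation (11) is the ordinary d-variate normal density, it fixes the "true" covariance matrix instead of choosing it at random, and the function names and test points are hypothetical.

```python
import math
import random

def normal_density_2d(x, mean, cov):
    """Bivariate normal density at x (assumed form of Equation (11))."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form Q = (x - mean)' cov^-1 (x - mean) for a 2 x 2 matrix.
    q = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

def draw_samples(cov, m, rng):
    """Step 2: draw m samples from N(0, cov) using a 2 x 2 Cholesky factor."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    samples = []
    for _ in range(m):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        samples.append((l11 * z1, l21 * z1 + l22 * z2))
    return samples

def sample_mean_cov(samples):
    """Step 3: sample mean and (unbiased) sample covariance matrix."""
    m = len(samples)
    mx = sum(s[0] for s in samples) / m
    my = sum(s[1] for s in samples) / m
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (m - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (m - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (m - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def relative_errors(true_cov, m, points, rng):
    """Steps 2-6 for one trial: relative error of the fitted density."""
    mean, cov = sample_mean_cov(draw_samples(true_cov, m, rng))
    errors = []
    for x in points:
        p_hat = normal_density_2d(x, mean, cov)               # step 4
        p_true = normal_density_2d(x, (0.0, 0.0), true_cov)   # step 5
        errors.append((p_hat - p_true) / p_true)              # step 6
    return errors

rng = random.Random(1968)
true_cov = ((2.0, 0.6), (0.6, 1.0))   # step 1, fixed here for brevity
points = [(0.0, 0.0), (1.0, 0.5), (2.0, -1.0)]
for m in (10, 100, 1000):
    print(m, [round(e, 3) for e in relative_errors(true_cov, m, points, rng)])
```

Repeating the loop over many trials and many covariance matrices would give the scatter of relative error against d, M, |U|, and Q that the procedure was meant to record.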


The approach here was to treat the relative error as a random variable
which was a function of the other random variables d, M, |U|, and Q. A
FORTRAN program was written to carry out these steps, but was never completely
debugged. The ultimate intent was to fit the surfaces of relative error
and other error descriptions as a function of d, M, |U|, and Q. This would have
allowed an error and confidence level approximation for each X with some
experimental justification. Similarly, a Monte Carlo error analysis could
also be made when the cluster samples are used to generate any other
probability density function under evaluation.


In the case of PDM the polynomials would be prechosen as the source
probability density functions. Confidence in this case could be defined
as a function of the number of learning samples, the degree of the polynomial,
and the number of dimensions.

The second approach was to view the problem in a stylized way so that
some helpful analysis might be applied. We made an assumption that the
multivariate normal function represented the shape of the "true" probability density
function well enough (most samples in the cluster near the mean, diminishing
to a very few at the edges). The second assumption was that the only error
was in the location of the unknown, X, with respect to the sample mean, $\bar{X}$. In this case,
the relative error can be approximated by

$$\text{relative error} = \frac{\Delta X \cdot \operatorname{grad} P(X)}{P(X)} \qquad (18)$$

where $\Delta X$ denotes an error in the distance to the mean. The derivation
of the gradient is given in Appendix D. To estimate $\Delta X$, one may make use
of the fact that the variance of the sample mean of a normal population is
1/M of the population variance. This allows the following extension:

$\Delta X =$ [expression illegible in the source scan]    (19)

where |U| is the determinant of the sample covariance matrix of M samples
in d dimensions. Once again, there was insufficient time to test the
feasibility of this approximation on the computer, and it is included here
as being suggestive of a possible direction for future work.
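
As a worked illustration of the relative-error approximation, the sketch below uses the property derived in Appendix D that the gradient of a normal density is P(X) times a second factor, so that the ratio |grad P(X)| / P(X) reduces to the length of $U^{-1}(X - \bar{X})$. It is restricted to two dimensions, the function names are hypothetical, and $\Delta X$ is taken as an input because Equation (19) cannot be read from the source.

```python
import math

def grad_over_p(x, mean, cov):
    """Return |grad P(X)| / P(X) for a bivariate normal density.

    For a normal density, grad P = -P * U^-1 (X - mean), so the ratio is
    the Euclidean length of U^-1 (X - mean).
    """
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    gx = (c * dx - b * dy) / det    # first component of U^-1 (X - mean)
    gy = (-b * dx + a * dy) / det   # second component
    return math.hypot(gx, gy)

def approx_relative_error(x, mean, cov, delta_x):
    """Relative error approximated as delta_x * |grad P(X)| / P(X)."""
    return delta_x * grad_over_p(x, mean, cov)

identity = ((1.0, 0.0), (0.0, 1.0))
print(approx_relative_error((3.0, 4.0), (0.0, 0.0), identity, 0.1))
```

The approximation vanishes at the sample mean and grows linearly with distance from it, which suggests it would be most informative for unknowns that fall toward the edge of a cluster.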

H. Number of Parameters. The parameters that the system has to
learn using a Gaussian fit for each cluster in a measurement space of d
dimensions are the d components of the sample mean, the d variance elements,
and the $d(d - 1)/2$ covariance elements of the sample covariance matrix. This
is a total of $(d^2 + 3d)/2$ parameters that would have to be retained for each
cluster. Table 3 shows the number of parameters for a range of dimensions
(d) and number of clusters (n).

Note that the tabulated values are equal to $n(d^2 + 3d)/2$.

TABLE 3

NUMBER OF MULTIVARIATE NORMAL LEARNING PARAMETERS

Number of                    Number of Clusters (n)
Dimensions (d)      1      2      3      4      5      6      7

       5           20     40     60     80    100    120    140
       6           27     54     81    108    135    162    189
       7           35     70    105    140    175    210    245
       8           44     88    132    176    220    264    308
       9           54    108    162    216    270    324    378
      10           65    130    195    260    325    390    455
      11           77    154    231    308    385    462    539
      12           90    180    270    360    450    540    630
      13          104    208    312    416    520    624    728
      14          119    238    357    476    595    714    833
      15          135    270    405    540    675    810    945
      16          152    304    456    608    760    912   1064
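
The entries of Table 3 follow directly from the formula $n(d^2 + 3d)/2$ and can be regenerated as a check. The short sketch below does so; the function name is hypothetical.

```python
def mnf_parameter_count(d, n=1):
    """Learning parameters for n clusters in d dimensions: n(d^2 + 3d)/2.

    Each cluster needs d means, d variances, and d(d - 1)/2 covariances.
    """
    return n * (d * d + 3 * d) // 2

# Regenerate the table: one row per dimension d, one column per n = 1..7.
for d in range(5, 17):
    print(d, [mnf_parameter_count(d, n) for n in range(1, 8)])
```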

I. The Polynomial Discriminant Method (PDM). The Polynomial Discriminant
Method is introduced here as the exemplar of the non-parametric statistical
approaches to classification. It was planned earlier in this project to compare
the PDM with the cluster analysis/multivariate normal fit approach
discussed in preceding sections. As with certain other portions of the
investigation, there was insufficient time to implement PDM on a computer
and carry out such a comparison.

The following paraphrases the important features of PDM and compares it
with the cluster analysis/multivariate normal fit approach. In the PDM, the
approach is to view each sample as independently representing a local parent
density whose form is a spherically symmetrical normal function. The overall
parent density is just the averaged sum of these functions expanded principally
as a polynomial. The spread of each individual contribution, $\sigma$, can be adjusted
to compensate for the "bumpy" density which arises from small numbers of samples.
The PDM works⁴ roughly as follows:

1. The learning algorithm requires that each sample be used one
at a time, so there is no need to store each sample after it has been observed.
The same is true of CA/MNF, inasmuch as nothing more than averaging is involved.

2. The algorithms for calculating the polynomial coefficients are
simple, as is also the case with calculating sample means and sample
covariance matrices.

3. The shape of the polynomial density function can be made as
complex or simple as desired by adjusting $\sigma$, the spread parameter. With
CA/MNF, the density function is a union of ellipsoidal shapes. While the
latter can get complex when there are many clusters, it is restricted as
compared to the PDM.

4. When the PDM is used for classification, the surfaces of equal
likelihood can be strictly linear or highly non-linear, depending on $\sigma$.
Such surfaces are always second degree for CA/MNF.

5. The PDM will work with only one sample. With CA/MNF, it is necessary to
have at least d samples to calculate a non-singular d x d sample covariance
matrix.

6. Theoretically, the PDM does not require any preliminary cluster
analysis. The CA/MNF has no meaning for unclustered data, and may
still work poorly unless the cluster is more or less convex.

7. The number of polynomial coefficients required increases
geometrically with the sum of the degree and number of dimensions (variables).
The basis for a comparison of the number of PDM coefficients with the
number of CA/MNF parameters is important enough to be developed in the
following section.
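
The density that the PDM starts from, an average of spherically symmetrical normal functions centered on the learning samples, can be sketched directly. The code below is only that underlying kernel average; it does not perform the polynomial expansion that gives the method its name, and the function name and sample values are hypothetical.

```python
import math

def kernel_density(x, samples, sigma):
    """Average of spherical normal functions centred on each learning sample."""
    d = len(x)
    norm = (2.0 * math.pi * sigma * sigma) ** (d / 2.0)
    total = 0.0
    for s in samples:
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += math.exp(-sq_dist / (2.0 * sigma * sigma))
    return total / (len(samples) * norm)

# Two one-dimensional samples: a small spread gives a "bumpy" two-peaked
# density, a large spread smooths it into a single peak between them.
samples = [(0.0,), (4.0,)]
for sigma in (0.5, 4.0):
    print(sigma, kernel_density((0.0,), samples, sigma),
          kernel_density((2.0,), samples, sigma))
```

The estimate is defined even for a single sample, which illustrates item 5, and the spread parameter alone controls how complex the resulting density is, which illustrates items 3 and 4.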


J. Comparison of Number of Learning Parameters. In the case where
each cluster in a sample set corresponds to a mode in the parent density,
the degree of the polynomial necessary to represent the parent density
must be at least one greater than the number of modes. Where r is the degree
of the polynomial, n is the number of modes, and d is the number of dimensions,
the number of terms in the polynomial is:

$$\text{number of terms} = \frac{(r + d)!}{r!\, d!} \qquad (20)$$

$$\text{number of terms} \ge \frac{(n + 1 + d)!}{(n + 1)!\, d!} \qquad (21)$$

Table 4 gives the number of terms for a range of n and d for comparison
with Table 3. The proper interpretation of the tables is not that the number
of terms in PDM is prohibitively large, as one might be tempted to think.
Note that for the case where there is only one mode, the number of parameters
is virtually the same as for the CA/MNF: with n = 1, Equation (21) gives
$(d + 2)(d + 1)/2$, one more than $(d^2 + 3d)/2$. We feel that it would be
possible to get the best results by combining cluster analysis with PDM;
i.e., fit polynomial density functions to clusters rather than attempt to
use the entire set of learning data to generate a single polynomial.
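
The two counts can be compared numerically. The sketch below evaluates the number of polynomial terms for the minimum degree r = n + 1 and sets it beside the CA/MNF count of $n(d^2 + 3d)/2$ from Section H; the function names are hypothetical.

```python
import math

def pdm_term_count(r, d):
    """Terms in a polynomial of degree r in d variables: (r + d)! / (r! d!)."""
    return math.comb(r + d, d)

def min_pdm_terms(n_modes, d):
    """Lower bound on terms when the degree must be at least n_modes + 1."""
    return pdm_term_count(n_modes + 1, d)

def mnf_parameter_count(d, n=1):
    """CA/MNF learning parameters for n clusters in d dimensions."""
    return n * (d * d + 3 * d) // 2

d = 5
for n in range(1, 5):
    print(n, min_pdm_terms(n, d), mnf_parameter_count(d, n))
```

For a single mode the two counts differ by one. For several modes a single polynomial grows much faster than the per-cluster normal fit, which is consistent with the suggestion to fit a low-degree polynomial to each cluster.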


IV. TARGET CLASSIFICATION SUBSYSTEM DESIGN

Figure 1, a functional block diagram, depicts the principal elements
of a target classification subsystem as it might be implemented within the
data processing and display complex of an advanced active sonar system. This
subsystem design incorporates the ideas and techniques which appear to be
most promising, and at the same time are compatible with total system
constraints. For convenience, the elements of the target classification
subsystem are grouped into the three subfunctions of preprocessing, learning
and adaptation, and classification. These subfunctions are discussed in order.

A. Preprocessing. The preprocessing subfunction embraces the
digital processes that reduce the inputs from the target detection and tracking
programs to vector representations useful to the other subfunctions. The
measurement extraction programs produce a finite set of target measurements
for each new look at a target. Varying degrees of extraction and processing
will be required to obtain this set of measurements. For example, target
speed will be available directly from the tracking program, whereas target
aspect angle and depth will require some computations. The measurement
extraction program is designed to reduce the quantity of target data
available and represent these data in a manner which emphasizes the features
that distinguish targets of interest.

FIGURE 1. ADVANCED ACTIVE SONAR SYSTEM DATA PROCESSING AND DISPLAY COMPLEX
INCLUDING TARGET CLASSIFICATION SUBSYSTEM. [Block diagram not reproduced; one block is labeled LEARNING AND ADAPTATION.]

When the desired measurements are extracted they will, in general,
be in different units. Normalization of the measurements can be accomplished
by dividing by their respective standard deviations, a procedure which is
equivalent to a scalar transformation. After scaling, it may be desirable
either to transform these vectors into a more propitious coordinate system
by performing an eigenvalue analysis, or to reduce the dimensionality by
doing a suitable mapping. The procedures which take the extracted
normalized measurements and produce the vectors {X} to be learned or classified
are termed "vector transformations" in the figure.
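
The normalization step can be illustrated as follows. This is a minimal sketch assuming the standard deviations are estimated from a set of previously extracted measurement vectors; the function names and the two sample measurements (speed and depth) are hypothetical.

```python
import math

def standard_deviations(vectors):
    """Sample standard deviation of each measurement across the vectors."""
    m, d = len(vectors), len(vectors[0])
    means = [sum(v[i] for v in vectors) / m for i in range(d)]
    return [math.sqrt(sum((v[i] - means[i]) ** 2 for v in vectors) / (m - 1))
            for i in range(d)]

def normalize(vector, stds):
    """Divide each measurement by its standard deviation to remove units."""
    return [x / s for x, s in zip(vector, stds)]

# Hypothetical measurements: (speed in knots, depth in feet).
raw = [[4.0, 100.0], [8.0, 300.0], [12.0, 200.0]]
stds = standard_deviations(raw)
scaled = [normalize(v, stds) for v in raw]
print(scaled)
```

After scaling, every measurement has unit standard deviation, so no single measurement dominates a distance or quadratic form merely because of its units.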

B. Learning and Adaptation. Learning (Switch B "UP" of Fig. 1)
occurs when the subsystem is exposed either to artificial data (Switch A "UP")
or to real sonar data (Switch A "DOWN"). In either case, when a vector is
presented for learning, its class membership must also be given. This new
vector is included in the data base for known vectors prior to re-initialization
of the cluster analysis routines which recompute (learn) new parameter values
or adapt existing ones. Note that the idea of adaptation also includes the
special case of one-shot learning; this is essentially adaptation from a
state of complete ignorance. In this sense, "learning" and "adaptation" can
be used interchangeably.

The element referred to in Fig. 1 as Parameter Learning and Adaptation is
completely general because the learned parameters could be coefficients of
a polynomial, the parameters of a d-variate normal distribution, or the
parameters of any other function that describes class probability densities.
Whatever their form, the most up-to-date values for the parameters are stored
in high-speed core memory for immediate use in evaluating the likelihood
ratios and confidence levels for unknown vectors.
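
For the d-variate normal case, adaptation of the stored parameters can be done one vector at a time without retaining the vectors themselves. The sketch below uses a standard running update of the mean and covariance (a Welford-style recurrence); it is offered as one possible realization under that assumption, not as the update used in the original programs, and the class name is hypothetical.

```python
class RunningGaussian:
    """Running sample mean and covariance for one cluster in d dimensions."""

    def __init__(self, d):
        self.d = d
        self.count = 0
        self.mean = [0.0] * d
        # Accumulated sum of products of deviations from the mean.
        self.scatter = [[0.0] * d for _ in range(d)]

    def update(self, x):
        """Adapt the parameters to one newly classified vector x."""
        self.count += 1
        before = [x[i] - self.mean[i] for i in range(self.d)]
        for i in range(self.d):
            self.mean[i] += before[i] / self.count
        after = [x[i] - self.mean[i] for i in range(self.d)]
        for i in range(self.d):
            for j in range(self.d):
                self.scatter[i][j] += before[i] * after[j]

    def covariance(self):
        """Unbiased sample covariance; needs at least two vectors."""
        return [[s / (self.count - 1) for s in row] for row in self.scatter]

cluster = RunningGaussian(2)
for vector in [(1.0, 2.0), (3.0, 6.0), (5.0, 10.0)]:
    cluster.update(vector)
print(cluster.mean, cluster.covariance())
```

Starting from an empty object corresponds to the one-shot learning case described above, in which adaptation begins from a state of complete ignorance.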

The function of storing a data base of known vectors is important as
it retains the information required to perform meaningful cluster analysis
on a background basis. Also, vectors from tracks that are as yet unclassified
must be retained until the operator classifies them. These vectors could
be stored temporarily in the data base as unknowns. The allocation of space
for these vectors would depend on operational data rates and the amount of
storage available.

C. Classification. In a completely automatic classification subsystem,
the conditional probability of membership in the several classes could be
estimated using the current parameter values and then perhaps combined
with cost functions and a priori probabilities to produce a classification
decision. The absence of reliable cost and a priori information in actual
operational situations is among the reasons why the ultimate responsibility
for target classification still remains with the sonar operator. In Figure 1
the automatic output of the classification subfunction is depicted as an
independent input to the sonar operator. This operator must continually exercise
his own judgment based on the total of the information received from the target
classification subsystem, the processed outputs from the passive sensors,
passive and active track histories, intelligence information, environmental
data, and system performance level for the sonar unit. The classification
decisions of the sonar operator are then sent to the personnel responsible for
fire control and command and control decisions.

When an operator classifies an active track, all samples from that
track will be fed back automatically to the target classification subsystem
and used to improve the previously learned parameter values by the adaptation
process. This will be done as background processing and will be effected only
when all immediate operational needs have been met.

V. RECOMMENDATIONS

During the course of this investigation, certain additional approaches
of considerable promise were isolated but were not completely evaluated
due to lack of time. If further work of an exploratory nature is contemplated
for automatic classification, the authors feel that the following recommended
approaches should be given consideration:

1. The Polynomial Discriminant Method of Dr. Donald F. Specht (Stanford
Electronics Laboratories) appears to have several strong points in its favor.
We feel that Dr. Specht should be contacted to learn the current status of
his work on the PDM. Some of his work was done under Navy contract on the POSEIDON
project and should be available to the Center. Also, it would be worthwhile
to know of improvements to his PDM and of any differences between theory and
the computer implementation. Dr. Specht's investigation has been in progress
for several years and the programs developed in its course undoubtedly reflect
considerable refinement. It would seem desirable to acquire these programs
directly and convert them for use on the Center's computers.

2. Work should be continued on cluster analysis until a method
emerges that is significantly superior to, and more reliable than, the other leading
contenders. What is needed is not an "ultimate" method but the best among
at least three promising methods. Those which currently appear most promising are:

a. Gradient Method

b. ISODATA

c. An adaptation of "hill climbing procedures" to find the maxima of
a PDM polynomial.

3. An effort should be made to introduce man-machine interaction
into the development and evaluation of classification methods. This could
be accomplished by direct CRT display of data such as is done in the PROMENADE
system.⁹ Alternatively, the dynamic results of cluster isolation, learning, and
adaptation procedures could be displayed on specialized formats. Ideally,
the investigator would be able to modify the controlling parameters of these
procedures on-line by console operator inputs. The programming of these
procedures would be for the general d-dimensional data with the display
representing a transformation into two dimensions or a two-dimensional subspace.

There are three important advantages that would be derived from the
development of this on-line display system: (1) the development, debugging,
and evaluation of learning, adaptation, and probability estimation procedures
would be greatly speeded up, (2) real data could be examined in detail and at
length to determine which combinations of measurements are most valuable for
distinguishing the important active sonar targets, and (3) the display would
provide the means to demonstrate visually the strongly intuitive concepts
of automatic classification. The display would serve to reduce the handicaps
of technical terminology and multi-dimensionality by providing a dynamic
representation of the classification procedures.

4. The prototype classification subsystem described in Section
IV of this note should be implemented in FORTRAN and perhaps JOVIAL. The
choice of these particular languages would facilitate a comparison with
other attempts at automatic classification subsystems designed under
Government and military auspices.

5. A method is needed for expressing confidence levels in a
manner which is both easy to compute and meaningful in an operational situation.
This means the development of a method which provides acceptable results
but does not consume computational time or space out of proportion to its
usefulness. The following is a recommended approach for developing such a
measure of confidence:

a. Generate data which will be representative of the operational
situation. These data would be generated by a statistical model where
the parent distributions, P(X)'s, would be known.

b. Reduce the data by the cluster analysis method chosen for
operational implementation.

c. For each cluster estimate the probability density, $\hat{P}(X)$,
of the parent distribution.

d. Compute the error $E = \hat{P}(X) - P(X)$ and chosen functions
of the error such as $E^2$, $|E|$, etc., over a range of points in the data space.

e. Analyze the results of the error computations from
several trials over a wide range of the independent variables such as
number of samples, number of dimensions, parameters of the parent probability
function, etc.

f. Relate the error functions to the independent variables.
A preliminary approach would be to plot the error functions as contours for
pairwise combinations of independent variables. It may be possible from this
analysis to eliminate the least sensitive variables.

g. Formulate the relationship between error and the selected
independent variables in terms of a measure of confidence level.

The conclusions which can be drawn from the work expended on this task
may be stated briefly as follows:

1. Existing statistical techniques or extensions of these techniques are
presently available and applicable to the problem of automatic target
classification by computer in active sonar systems.

2. Implementation of these techniques into a workable target
classification subsystem such as that described in Section IV does appear to be
feasible in view of operational and system constraints.

3. An effort should be initiated to refine the available classification
techniques further and to incorporate them into an operating subsystem which
is capable of demonstrating active sonar target classification on a real-time
basis.

APPENDIX A

A MEASURE OF TARGET RELATIVE IMPORTANCE

Suppose that we obtain a single-ping estimate of the probability that
a target belongs to a hostile class, $P(\theta_h)$, and a normalized measure of
confidence in that probability estimate, $0 \le C \le 1$. We would like to define
a function, $G[P(\theta_h), C]$, to evaluate the "importance" of the target that
satisfies the following criteria:

1. When C = 0, $G[P(\theta_h), C] = k$. When we have no confidence in the
probability estimate, all targets are equally important.

2. When C = 1 and $P(\theta_h) = 1$, G is a maximum. The most important
target is the one that we are positively sure is hostile.

3. When C = 1 and $P(\theta_h) = 0$, G is a minimum. The least important
target is the one that we are positively sure is not hostile.

A function which satisfies the above requirements is

$$G = k + (P - k)C, \quad 0 < k < 1$$

Criterion     G      P      C

    1         k      any    0
    2         max    1      1
    3         min    0      1

The choice of k determines how important the lack of confidence really is.
This would suggest that k could be adjusted dynamically; that is, be kept
small if C is frequently small. The objective here would be to make G
sensitive to either P or C depending upon which variable is better suited
to enable targets to be ranked according to their importance.
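
The importance function is simple enough to check against the three criteria directly. The sketch below does so; the function name and the default value of k are hypothetical choices made only for illustration.

```python
def importance(p_hostile, confidence, k=0.5):
    """Target importance G = k + (P - k) * C, with P and C both in [0, 1]."""
    if not (0.0 <= p_hostile <= 1.0 and 0.0 <= confidence <= 1.0):
        raise ValueError("P and C must both lie in [0, 1]")
    return k + (p_hostile - k) * confidence

# Criterion 1: with no confidence, every target scores k regardless of P.
print(importance(0.9, 0.0, k=0.3), importance(0.1, 0.0, k=0.3))
# Criteria 2 and 3: with full confidence, G equals P itself.
print(importance(1.0, 1.0), importance(0.0, 1.0))
```

With full confidence G reduces to P, and as confidence falls every target is pulled toward the common value k, so a small k ranks poorly measured targets low while a large k ranks them high.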

APPENDIX B

MEASURES OF CLUSTER-SEEKING PERFORMANCE

Well-Separated Clusters. In the case where there are N well-separated or
"true" clusters $C_1, C_2, \ldots, C_i, \ldots, C_N$ which are discerned as M "apparent"
clusters $K_1, \ldots, K_j, \ldots, K_M$, the number of samples associated with the i-th true
cluster and the j-th apparent cluster, $f_{ij}$, can be recorded in a frequency
matrix $F = (f_{ij})_{N \times M}$. The mutual information, I, has a maximum value in
this application when all samples are correctly associated with the original
N clusters, and it is zero when the apparent clusters are totally confused;
i.e., when $\sum_{j=1}^{M} f_{ij}$ is proportional to the number of samples in each $C_i$.

$$I = \frac{H(C) + H(K) - H(C * K)}{H(C)} \qquad (B-1)$$

$$H(C) = -\sum_{i=1}^{N} F_i \ln F_i; \quad F_i = \sum_{j=1}^{M} f_{ij} \qquad (B-2)$$

$$H(K) = -\sum_{j=1}^{M} F_j \ln F_j; \quad F_j = \sum_{i=1}^{N} f_{ij} \qquad (B-3)$$

$$H(C * K) = -\sum_{i=1}^{N} \sum_{j=1}^{M} f_{ij} \ln f_{ij} \qquad (B-4)$$

The purpose of using H(C) in the denominator is to normalize
$0 \le I \le 1$. A cluster-seeking method should score close to 1 if its logic has
been programmed correctly and its logic is correct to begin with.
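
The normalized mutual information score for the well-separated case can be sketched as follows. The code assumes the usual entropy definitions with a leading minus sign, and it converts the matrix of sample counts to relative frequencies before taking logarithms; both are assumptions made here, and the function names are hypothetical.

```python
import math

def entropy(probs):
    """Entropy -sum(p ln p), skipping zero entries."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def normalized_mutual_information(counts):
    """I = (H(C) + H(K) - H(C*K)) / H(C) for an N x M matrix of counts.

    Rows are true clusters, columns are apparent clusters.
    """
    total = float(sum(sum(row) for row in counts))
    joint = [[c / total for c in row] for row in counts]
    h_c = entropy([sum(row) for row in joint])          # true clusters
    h_k = entropy([sum(col) for col in zip(*joint)])    # apparent clusters
    h_ck = entropy([p for row in joint for p in row])   # joint
    return (h_c + h_k - h_ck) / h_c

print(normalized_mutual_information([[10, 0], [0, 10]]))   # perfect recovery
print(normalized_mutual_information([[5, 5], [5, 5]]))     # total confusion
print(normalized_mutual_information([[9, 1], [2, 8]]))     # partial recovery
```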


In the general case, randomly generated clusters will overlap in varying
degrees so that the system of apparent clusters may not always be compared to
the true clusters as in the well-separated case. Therefore, an additional
effort would have to be made to reward the resolution of overlapping clusters
and take for granted the discernment of isolated clusters. A convenient
measure of the separation between two clusters is Sebestyen's interset distance,
$S[\{X\}_m, \{Y\}_n]$, a mean squared distance computed as below:

$$S = \frac{1}{mn} \sum_{j=1}^{m} \sum_{k=1}^{n} \sum_{i=1}^{d} (x_{ij} - y_{ik})^2 \qquad (B-5)$$

$$S = \sum_{i=1}^{d} \overline{x_i^2} + \sum_{i=1}^{d} \overline{y_i^2} - 2 \sum_{i=1}^{d} \bar{x}_i \bar{y}_i$$

where the overbars denote averages over the samples of each cluster.


We  may  now  define  a normalized  measure  of  the  "resemblance"  R between  two 


clusters: 


R-a"®;0£R<.l;a>0 


Designating the average resemblance of the i-th true cluster to the other
true clusters by $R_i$, we can define a measure of significant performance


(B-6) 


N M 

1 - Z z f R ; 
1-1  1-1  ^ 


0 <.  I <.  1 


(B-7) 
where T is the total number of samples. This function is not at all similar
to the I suggested for the well-separated case. Here, the apparent clusters
may agree with the true clusters perfectly, but achieve I ≈ 0 if the true
clusters are trivially well separated. High scores can only be achieved
when overlapping true clusters are correctly identified. The indicated use
of this general measure is to compare two methods, perhaps by a scatter
diagram as illustrated below.

[Scatter diagram not reproduced; one axis is labeled "I, Method A".]

APPENDIX C

MISSING MEASUREMENTS

In the case where all d dimensions are observed, use is made of a stored
determinant of the sample covariance matrix and its inverse to compute

$$P(X) = \frac{1}{(2\pi)^{d/2} |U|^{1/2}} \exp\left(-\frac{1}{2} \sum_{i=1}^{d} \sum_{j=1}^{d} u^{ij} x_i x_j\right) \qquad (C-1)$$

where $u^{ij}$ is an element of $U^{-1}$ and $x_i$ is the distance of the i-th measurement
with respect to the i-th sample mean. However, if measurements are missing,
they can be ignored in the computation of the exponent; namely, the program
will omit terms for which i or j corresponds to a missing measurement.
The term $(2\pi)^{d/2}$ can simply be obtained from a table indexed by d.
The new determinant, however, does present a computational problem. There
appear to be three possible approaches:

1. Compute the determinant of the remaining matrix when the rows
and columns corresponding to the missing measures are removed. This is
straightforward, but time-consuming and potentially redundant.

2. Reduce the "full" determinant by making use of stored minors.
This works well for one missing measure, but becomes awkward thereafter.
It may still be the most expedient approach in the last analysis.

If k is the only missing measure, the desired determinant is

(C-4)

The terms in parentheses could easily be stored for each measure.

3. Estimate the new determinant as a function of |U| and correlations
among the variables. This approach has not been explored, but appears
to be promising.
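
For a single missing measure k, one standard identity relates the reduced determinant to quantities that are already stored: the determinant of U with row and column k removed equals |U| times the k-th diagonal element of $U^{-1}$. The sketch below checks this numerically. It is offered as an illustration of the second approach under that assumption and is not necessarily the form of Equation (C-4), which cannot be read from the source; the function names and the example matrix are hypothetical.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def inverse(matrix):
    """Inverse by Gauss-Jordan elimination (assumes a non-singular matrix)."""
    n = len(matrix)
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def drop_measure(matrix, k):
    """Remove row k and column k (approach 1: work in the subspace)."""
    return [[v for j, v in enumerate(row) if j != k]
            for i, row in enumerate(matrix) if i != k]

def reduced_determinant(det_u, inv_u, k):
    """Approach 2: reduced determinant from the stored |U| and U^-1."""
    return det_u * inv_u[k][k]

u = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]]
det_u, inv_u = determinant(u), inverse(u)
for k in range(3):
    print(k, determinant(drop_measure(u, k)), reduced_determinant(det_u, inv_u, k))
```

Because |U| and $U^{-1}$ are already stored for the likelihood computation, this costs one multiplication per missing measure, which may be why the approach works well for one missing measure and becomes awkward for several.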

APPENDIX D

GRADIENT OF MULTIVARIATE NORMAL DISTRIBUTION

With sample size and number of dimensions held constant, it is reasonable
to assume that the error in the estimated probability will be proportional to
the gradient at the point $X = (x_1, \ldots, x_d)$, that is, the maximum rate at which
P(X) is changing at X.

$$P(X) = \frac{C}{|U|^{1/2}} \exp\left(-\frac{1}{2} Q\right) \qquad (D-1)$$

To differentiate with respect to the k-th variable (regarding the remaining
d-1 variables as constant), Q can be rewritten as

$$Q = \sum_{i=1}^{d} \sum_{j=1}^{d} u^{ij} x_i x_j \qquad (D-2)$$

The partial derivative with respect to $x_k$ is obtained in the following
steps:

$$\frac{\partial Q}{\partial x_k} = 2 \sum_{j=1}^{d} u^{kj} x_j \qquad (D-4)$$

$$\frac{\partial P(X)}{\partial x_k} = \frac{dP(X)}{dQ} \cdot \frac{\partial Q}{\partial x_k} \qquad (D-5)$$

$$\frac{\partial P(X)}{\partial x_k} = -P(X) \sum_{j=1}^{d} u^{kj} x_j$$

Finally, the absolute value of the gradient is the root of the sum of the
squares of the d partial derivatives.
The gradient is therefore the product of two terms, one of which is P(X).
The gradient thus goes to zero as P(X) goes to zero. The gradient is also
zero when X = 0. It can be verified that the gradient is at a maximum at
one standard deviation from the mean in any direction.